ABSTRACT

Some  probability  models  for  classifying  individuals  as  belonging  to  one 
of  two  or  more  populations  using  scale  invariant  discriminant  functions  are 
considered. The investigation is motivated by practical situations where the
observed data on an individual are in the form of ratios of some basic measurements,
or measurements scaled by an unknown non-negative number. The probability
models are obtained by considering a p-vector random variable $X$ with a known
distribution and deriving the distribution of the random vector $Y = [G(X)]^{-1}X$,
where $G(X)$ is a non-negative measure of size such that $G(\lambda X) = \lambda G(X)$ for $\lambda > 0$.
Explicit expressions are obtained for the densities of what are called the Angular
Gaussian, the Compositional Gaussian Type 1 and the Compositional Gaussian Type 2
distributions.

Key words: Angular Gaussian Distribution, Compositional Gaussian Distribution,
Discriminant function, Scale free methods.
1.  INTRODUCTION

We  consider  the  problem  of  classifying  an  individual  as  belonging  to  one  of  two  or 
more  populations  using  scale  invariant  discriminant  functions.  The  investigation  is  motivated 
by  practical  situations  where  the  observed  data  on  an  individual  are  in  the  form  of  ratios 
of  some  basic  measurements  or  measurements  scaled  by  an  unknown  non-negative  number. 

In  this  paper  we  derive  some  probability  models  for  applications  to  such  data. 

If $X' = (x_1, \ldots, x_p)$ is a vector of p basic measurements which may be known only
up to a positive scaling factor, then we may consider the transformed measurements

$$ (y_1, \ldots, y_p)' = Y = [G(X)]^{-1} X \tag{1.1} $$

which are scale free if G is some non-negative measure of size such that
$G(\lambda X) = \lambda G(X)$ for $\lambda > 0$. Some typical examples of G(X) are

$$ G(X) = \|X\| = (X'X)^{1/2}, \tag{1.2} $$

$$ G(X) = \Big|\sum_i x_i\Big|. \tag{1.3} $$


We call the corresponding transformed variables $Y = X/\|X\|$ and $Y = X/|\sum_i x_i|$
directional and compositional data respectively. We note that the term compositional
data is usually applied to a set of non-negative proportions (see Aitchison (1985)), but
our definition is more flexible. We refer to $Y = X/|\sum_i x_i|$ as compositional data of
type 1 and $Y = X/\sum_i x_i$ as of type 2, even when the $x_i$ are not all non-negative.
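The three scale-free transformations can be sketched in a few lines. This is only an illustration: the function names and the sample vector are assumptions made here, not part of the paper, and no guard is included for a zero size measure.

```python
import math

def directional(x):
    """Y = X / ||X||: direction cosines on the unit sphere."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

def compositional_type1(x):
    """Y = X / |sum x_i|: components sum to +1 or -1."""
    size = abs(sum(x))
    return [v / size for v in x]

def compositional_type2(x):
    """Y = X / sum x_i: components always sum to +1."""
    size = sum(x)
    return [v / size for v in x]

# Hypothetical measurements and the same individual seen at another scale.
x = [3.0, -1.0, 2.0]
scaled = [2.5 * v for v in x]
neg = [-3.0, 1.0, -2.0]  # a vector whose components sum to a negative number
```

Applying any of the three functions to `x` and to `scaled` gives the same vector, which is the scale invariance required of Y. For `neg`, the type 1 components sum to -1 while the type 2 components sum to +1.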


It is also interesting to note that when we have compositional data with non-negative
proportions $(y_1, \ldots, y_p)$ such that $\sum_i y_i = 1$, we may transform them into
directional data by considering $(\sqrt{y_1}, \ldots, \sqrt{y_p})$ and use appropriate probability models
for directional data (with non-negative components) for statistical analysis, as suggested
by Stephens (1982).

One way of generating probability models for directional and compositional data is to
consider a probability distribution for the basic measurements X and then derive the
induced distribution for $Y = [G(X)]^{-1}X$. In this paper we assume that $X \sim N_p(\mu, \Sigma)$, i.e.,
p-variate normal with mean vector $\mu$ and variance-covariance matrix $\Sigma$, and derive the
distribution of Y for different size functions G.

Once  an  appropriate  probability  model  is  chosen,  the  problem  of  discrimination  can 
be  handled  in  the  usual  way. 

We  also  comment  on  non-parametric  methods  for  estimation  of  density  for 
directional  and  compositional  data. 

2.  CLASSES  OF  DISTRIBUTIONS  FOR  DIRECTIONAL  DATA 


2.1  Angular  Gaussian  Distribution  (AGD) 

Let $X \sim N_p(\mu, \Sigma)$ and define $Y = R^{-1}X$ where $R = \|X\| = (X'X)^{1/2}$, so that Y is the
vector of direction cosines with the condition $Y'Y = 1$. The marginal distribution of Y on
$\Omega_p$, the p-dimensional unit sphere, is called the AGD (Angular Gaussian Distribution). For the
special case when $\Sigma = \sigma^2 I$, Bingham obtained the distribution of Y in the form of an
infinite series (see Watson (1983), p. 226). In this section, we obtain the distribution of Y
in the general case in a closed form involving a finite number of terms.

Consider a polar transformation from X to $(R, \theta)$, where $\theta$ is a vector of p-1 angles,
in which case the Jacobian is of the form

$$ \frac{D(x_1, \ldots, x_p)}{D(R, \theta)} = R^{p-1} f(\theta). \tag{2.1} $$


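The transformed p.d.f. (probability density function) is

$$ |2\pi\Sigma|^{-1/2} R^{p-1} f(\theta) \exp[-2^{-1}(R^2 Q_3 - 2RQ_2 + Q_1)] \tag{2.2} $$

where

$$ Q_1 = \mu'\Sigma^{-1}\mu, \quad Q_2 = \mu'\Sigma^{-1}Y, \quad Q_3 = Y'\Sigma^{-1}Y, $$

and Y is a function of $\theta$ only. Changing from R to $r = RQ_3^{1/2}$, the p.d.f. (2.2) transforms to

$$ |2\pi\Sigma|^{-1/2} Q_3^{-p/2} f(\theta) \exp[-2^{-1}(Q_1 - Q_2^2 Q_3^{-1})] \times r^{p-1} \exp[-2^{-1}(r - Q_2 Q_3^{-1/2})^2]. \tag{2.3} $$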


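Integrating out r from 0 to $\infty$, the p.d.f. of Y with respect to the surface element
$d\omega_p$ on the unit sphere $\Omega_p$ is

$$ p(y \mid \mu, \Sigma) = |2\pi\Sigma|^{-1/2} Q_3^{-p/2} I_p(Q_2 Q_3^{-1/2}) \exp[-2^{-1}(Q_1 - Q_2^2 Q_3^{-1})] \tag{2.4} $$

where

$$ I_p(\alpha) = \int_0^{\infty} r^{p-1} \exp[-2^{-1}(r - \alpha)^2] \, dr. \tag{2.5} $$

The function $I_p$ satisfies the recurrence relation

$$ I_p(\alpha) = (p-2) I_{p-2}(\alpha) + \alpha I_{p-1}(\alpha) \quad \text{for } p > 2 \tag{2.6} $$

with the initial values

$$ I_2(\alpha) = e^{-\alpha^2/2} + \alpha I_1(\alpha), \qquad I_1(\alpha) = \sqrt{2\pi}\, \Phi(\alpha), $$

where $\Phi(\alpha)$ is the distribution function of $N_1(0,1)$.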
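The recurrence for $I_p$ is easy to check numerically. The sketch below, with function names chosen here for illustration, evaluates $I_p(\alpha)$ from the initial values and the recurrence (2.6), and compares it with a plain midpoint-rule evaluation of the integral (2.5). The truncation point and step count of the quadrature are arbitrary assumptions.

```python
import math

def phi_cdf(a):
    """Distribution function of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def I(p, a):
    """I_p(a) of (2.5), built up from I_1 and I_2 by the recurrence (2.6)."""
    prev = math.sqrt(2.0 * math.pi) * phi_cdf(a)   # I_1(a)
    if p == 1:
        return prev
    cur = math.exp(-0.5 * a * a) + a * prev        # I_2(a)
    for k in range(3, p + 1):
        # I_k = (k - 2) I_{k-2} + a I_{k-1}
        prev, cur = cur, (k - 2) * prev + a * cur
    return cur

def I_numeric(p, a, upper=40.0, steps=20000):
    """Midpoint-rule approximation of the integral (2.5), truncated at `upper`."""
    h = upper / steps
    total = 0.0
    for j in range(steps):
        r = (j + 0.5) * h
        total += r ** (p - 1) * math.exp(-0.5 * (r - a) ** 2)
    return h * total
```

For strongly negative $\alpha$ the recurrence subtracts nearly equal quantities, so a careful implementation may need a different evaluation scheme in that regime.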
It may be noted that the p.d.f. (2.4) remains unchanged if $\mu$ and $\Sigma$ are replaced by
$a\mu$ and $a^2\Sigma$ for any $a > 0$. However, we can make the parameters unique by imposing the
condition $\|\mu\| = 1$.
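As a numerical illustration of (2.4) and of the invariance just noted, the following sketch evaluates the angular Gaussian density for p = 2. The helper names, the parameter values, and the choice to pass $\Sigma^{-1}$ and $|\Sigma|$ directly (to avoid a matrix library) are assumptions made for this example.

```python
import math

def phi_cdf(a):
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def I(p, a):
    """I_p(a) of (2.5) via the recurrence (2.6)."""
    prev = math.sqrt(2.0 * math.pi) * phi_cdf(a)
    if p == 1:
        return prev
    cur = math.exp(-0.5 * a * a) + a * prev
    for k in range(3, p + 1):
        prev, cur = cur, (k - 2) * prev + a * cur
    return cur

def quad(u, m, v):
    """Bilinear form u' M v."""
    return sum(u[i] * m[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

def agd_density(y, mu, prec, det_sigma):
    """Density (2.4) at a unit vector y; prec is Sigma^{-1}, det_sigma is |Sigma|."""
    p = len(y)
    q1, q2, q3 = quad(mu, prec, mu), quad(mu, prec, y), quad(y, prec, y)
    norm = ((2.0 * math.pi) ** p * det_sigma) ** -0.5
    return (norm * q3 ** (-p / 2.0) * I(p, q2 / math.sqrt(q3))
            * math.exp(-0.5 * (q1 - q2 * q2 / q3)))

# Arbitrary illustrative parameters: Sigma = [[2, 0.5], [0.5, 1]], ||mu|| = 1.
mu = [0.6, 0.8]
det_sigma = 2.0 * 1.0 - 0.5 * 0.5
prec = [[1.0 / det_sigma, -0.5 / det_sigma],
        [-0.5 / det_sigma, 2.0 / det_sigma]]

def total_mass(mu, prec, det_sigma, steps=2000):
    """Integrate the density around the unit circle (p = 2, surface element d(angle))."""
    h = 2.0 * math.pi / steps
    return h * sum(agd_density([math.cos(k * h), math.sin(k * h)], mu, prec, det_sigma)
                   for k in range(steps))
```

Replacing `mu` by `a * mu`, `prec` by `prec / a**2` and `det_sigma` by `a**4 * det_sigma` (the p = 2 form of $a^2\Sigma$) leaves `agd_density` unchanged, and `total_mass` should be close to 1.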

2.2  Langevin Distribution

We can generate other distributions for Y from (2.2) and (2.3), as suggested by
Fisher (1953), by considering conditional instead of marginal distributions. Thus, from the
expression (2.3), the conditional p.d.f. of Y on the unit sphere $\Omega_p$ given r = 1 is

$$ \text{const.}\; Q_3^{-p/2} \exp(Q_2 Q_3^{-1/2}) = \text{const.}\; (Y'\Sigma^{-1}Y)^{-p/2} \exp\big(\mu'\Sigma^{-1}Y / \sqrt{Y'\Sigma^{-1}Y}\big) \tag{2.7} $$

where we may impose the restriction $\|\mu\| = 1$. When $\Sigma = \sigma^2 I$ we have the Langevin
(1905) - von Mises (1918) - Fisher (1953) distribution

$$ \text{const.}\; \exp(\mu'Y/\sigma) \tag{2.8} $$

on the surface of a p-dimensional sphere.

From the expression (2.2), we find that the conditional p.d.f. of Y on the unit sphere $\Omega_p$
given R = 1, with respect to the surface element $d\omega_p$, is

$$ \text{const.}\; \exp[-2^{-1}(Y - \mu)'\Sigma^{-1}(Y - \mu)] \tag{2.9} $$

where $\|\mu\| = 1$.

We add two other classes of distributions found to be useful in practical
applications as possible models for directional data:

Scheidegger-Watson p.d.f.: $[b_p(\kappa)]^{-1} \exp[\kappa(\mu'Y)^2]$, (2.10)

Bingham p.d.f.: $[b(K)]^{-1} \exp(Y'KY)$, where K is a p×p symmetric matrix. (2.11)


2.3  Estimation  of  Parameters 

The model (2.4), which is the angular Gaussian distribution, can be used to construct
scale invariant discriminant functions provided the parameters $\mu$ and $\Sigma$ are known. If they
are unknown we may have to estimate them from past observations $Y_1, \ldots, Y_n$ on Y. Using
the density function (2.4), the likelihood based on past data is

$$ \prod_{i=1}^{n} p(Y_i \mid \mu, \Sigma) \tag{2.12} $$

with the restriction $\|\mu\| = 1$. The method of maximum likelihood for estimation of
parameters can be implemented without much difficulty since the derivatives of all the
expressions involved in (2.4) with respect to $\mu$ and $\Sigma$ can be easily evaluated. However,
there are many parameters to be estimated (p + p(p+1)/2 - 1 after the normalization), and
a very large sample may be necessary to obtain reasonably good estimators.

We may consider an alternative method based on the marginal bivariate
distributions of Y, where $Y' = (x_1/\|X\|, \ldots, x_p/\|X\|)$ and $X \sim N_p(\mu, \Sigma)$. If $y_1$ and $y_2$
are the first two components of Y, then it is easily seen that

$$ P(y_1 \le a y_2) = P(x_1 - a x_2 \le 0) = \Phi\big[(a\mu_2 - \mu_1)/(a^2\sigma_{22} - 2a\sigma_{12} + \sigma_{11})^{1/2}\big] \tag{2.13} $$

where $\Phi$ is the distribution function of $N_1(0,1)$. If we have a sample of size n on Y with
the first two components $(y_{1i}, y_{2i})$, i = 1, ..., n, we can estimate $p_a = P(y_1 \le a y_2)$ for any
given a by

$$ \hat{p}_a = \text{proportion of } i\text{'s such that } y_{1i} \le a y_{2i}. \tag{2.14} $$


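Then we have the observational equations

$$ \Phi^{-1}(\hat{p}_a) = (a\mu_2 - \mu_1)/(a^2\sigma_{22} - 2a\sigma_{12} + \sigma_{11})^{1/2} \tag{2.15} $$

or

$$ (a\mu_2 - \mu_1)^2 = [\Phi^{-1}(\hat{p}_a)]^2 (a^2\sigma_{22} - 2a\sigma_{12} + \sigma_{11}). \tag{2.16} $$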
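The relation between the empirical proportion and the normal probability can be illustrated by simulation. In the sketch below all parameter values, the sample size and the seed are arbitrary assumptions, and p = 2 is used so that $\|X\|$ involves only the two simulated components.

```python
import math
import random
from statistics import NormalDist

def p_hat(pairs, a):
    """Proportion of observations with y1 <= a * y2."""
    return sum(1 for y1, y2 in pairs if y1 <= a * y2) / len(pairs)

def p_model(a, mu1, mu2, s11, s12, s22):
    """Phi[(a mu2 - mu1) / sqrt(a^2 s22 - 2 a s12 + s11)]."""
    sd = math.sqrt(a * a * s22 - 2.0 * a * s12 + s11)
    return NormalDist().cdf((a * mu2 - mu1) / sd)

# Hypothetical parameter values for the illustration.
mu1, mu2 = 1.0, 2.0
s11, s12, s22 = 1.0, 0.3, 1.5

# Cholesky factor of [[s11, s12], [s12, s22]] for sampling correlated normals.
l11 = math.sqrt(s11)
l21 = s12 / l11
l22 = math.sqrt(s22 - l21 * l21)

rng = random.Random(0)
pairs = []
for _ in range(20000):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x1 = mu1 + l11 * z1
    x2 = mu2 + l21 * z1 + l22 * z2
    r = math.hypot(x1, x2)          # only the direction (y1, y2) is retained
    pairs.append((x1 / r, x2 / r))
```

With these values, `p_hat(pairs, a)` should agree with `p_model(a, ...)` up to sampling error for any fixed a; at a = 0.5 the model probability is exactly 1/2 because $a\mu_2 - \mu_1 = 0$.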


p(p-1)/2 families of equations of the kind (2.15) or (2.16) are available, involving all the
elements of $\mu$ and $\Sigma$, by considering every pair of components in Y. From the equations
(2.15) or (2.16) it is clear that only ratios of the parameters can be estimated. They can
be made unique by using a restriction like $\|\mu\| = 1$. An appropriate method may be
used to combine the equations (2.15) or (2.16) to produce the requisite number of
consistent equations to estimate the parameters.


We describe one of the methods. First, we note that by smoothing $\hat{p}_a$ in (2.15), we
can estimate $a_{-1}$, $a_0$, $a_1$ such that $p_{a_{-1}} = 1/4$, $p_{a_0} = 1/2$ and $p_{a_1} = 3/4$. Writing

$$ \Phi^{-1}(1/4) = q_{-1}, \quad \Phi^{-1}(1/2) = 0, \quad \Phi^{-1}(3/4) = q_1 \tag{2.17} $$

the equations (2.16) for $a_{-1}$, $a_0$, $a_1$ can be written as

$$ a_0\mu_2 - \mu_1 = 0, \tag{2.18} $$

$$ (a_s\mu_2 - \mu_1)^2 = q_s^2 (a_s^2\sigma_{22} - 2a_s\sigma_{12} + \sigma_{11}), \quad s = -1, 1. \tag{2.19} $$


There are p(p-1)/2 equations of the kind (2.18) obtained by considering all pairs of the
components of Y. They yield estimates of the ratios of $\mu_1, \ldots, \mu_p$, which can be standardised
to satisfy the restriction $\|\mu\| = 1$. Then we have p(p-1) equations of the type (2.19)
involving the p(p+1)/2 parameters in $\Sigma$. Observing that the equations are linear in $\sigma_{ij}$, we
may combine them by the least squares method to produce p(p+1)/2 equations, by solving
which we obtain the estimates of $\sigma_{ij}$.

The estimates obtained by the above method may still require large samples. Other
methods of combining the equations (2.15) or (2.16) have to be explored.


3.  CLASSES OF DISTRIBUTIONS FOR COMPOSITIONAL DATA


3.1  Compositional Gaussian Distribution, Type 1, [CGD(1)]

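Let $X \sim N_p(\mu, \Sigma)$ and define $Y = |\sum_i x_i|^{-1} X$ so that $|\sum_i y_i| = 1$. We call the distribution
of Y on the set

$$ S = \{Y : |\textstyle\sum_i y_i| = 1\} \tag{3.1} $$

the Compositional Gaussian Distribution, Type 1. We distinguish two sets

$$ S_+ = \{Y : \textstyle\sum_i y_i = 1\}, \quad S_- = \{Y : \textstyle\sum_i y_i = -1\} \tag{3.2} $$

and note that $S_+ \cup S_- = S$ and

$$ P(Y \in S_+) = P(\textstyle\sum_i x_i > 0) \quad \text{and} \quad P(Y \in S_-) = P(\textstyle\sum_i x_i < 0). $$

In $S_+$ we consider the transformation

$$ x_i = R y_i, \quad i = 1, \ldots, p-1, \qquad x_p = R(1 - y_1 - \cdots - y_{p-1}) = R y_p, \tag{3.3} $$

and in $S_-$,

$$ x_i = -R y_i, \quad i = 1, \ldots, p-1, \qquad x_p = R(1 + y_1 + \cdots + y_{p-1}) = -R y_p. \tag{3.4} $$

In both cases $R = \sum_i x_i$, which is positive in $S_+$ and negative in $S_-$.
The Jacobian of the transformation in either case is, up to sign,

$$ \frac{D(x_1, \ldots, x_p)}{D(R, y_1, \ldots, y_{p-1})} = R^{p-1}. \tag{3.5} $$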


The p.d.f. of X in $S_+$ transforms to

$$ (2\pi)^{-p/2} |\Sigma|^{-1/2} R^{p-1} \exp[-2^{-1}(Q_3 R^2 - 2Q_2 R + Q_1)] \tag{3.6} $$

where

$$ Q_3 = Y'\Sigma^{-1}Y, \quad Q_2 = \mu'\Sigma^{-1}Y, \quad Q_1 = \mu'\Sigma^{-1}\mu. \tag{3.7} $$

Making the transformation $r = RQ_3^{1/2}$, the expression (3.6) changes to

$$ (2\pi)^{-p/2} |\Sigma|^{-1/2} Q_3^{-p/2} \exp[-2^{-1}(Q_1 - Q_2^2 Q_3^{-1})] \times r^{p-1} \exp[-2^{-1}(r - Q_2 Q_3^{-1/2})^2]. \tag{3.8} $$

Integrating out with respect to r from 0 to $\infty$, the p.d.f. of Y in $S_+$ with respect to the
volume element $dy_1 \cdots dy_{p-1}$ is

$$ (2\pi)^{-p/2} |\Sigma|^{-1/2} Q_3^{-p/2} I_p(Q_2 Q_3^{-1/2}) \exp[-2^{-1}(Q_1 - Q_2^2 Q_3^{-1})] \tag{3.9} $$

where $I_p(\alpha)$ is as defined in (2.5).

In $S_-$, under the transformation (3.4), the expression corresponding to (3.8) is

$$ (2\pi)^{-p/2} |\Sigma|^{-1/2} Q_3^{-p/2} \exp[-2^{-1}(Q_1 - Q_2^2 Q_3^{-1})] \times |r|^{p-1} \exp[-2^{-1}(r + Q_2 Q_3^{-1/2})^2]. \tag{3.10} $$

Integrating out with respect to r from $-\infty$ to 0, we obtain the same expression as in (3.9)
for the p.d.f. of Y in $S_-$.

It may be noted that in the expression (3.9), we can impose a suitable restriction on
$\mu$ to make the parameters identifiable.
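The split of probability between $S_+$ and $S_-$ in Section 3.1 can be made concrete. The closed form used below is not stated in the text; it is the standard consequence of $\sum_i x_i$ being a linear function of X, hence normal with mean $\sum_i \mu_i$ and variance equal to the sum of all elements of $\Sigma$. The function name and the example parameters are assumptions.

```python
import math
from statistics import NormalDist

def prob_s_plus(mu, sigma):
    """P(Y in S_+) = P(sum x_i > 0) for X ~ N_p(mu, sigma).

    sum x_i ~ N(sum mu_i, sum of all sigma_ij), so the probability is
    Phi(sum mu_i / sqrt(sum of all sigma_ij)).
    """
    total_mean = sum(mu)
    total_var = sum(sum(row) for row in sigma)
    return NormalDist().cdf(total_mean / math.sqrt(total_var))

# Hypothetical example: two uncorrelated unit-variance measurements.
identity2 = [[1.0, 0.0], [0.0, 1.0]]
centered = prob_s_plus([0.0, 0.0], identity2)   # symmetric case
shifted = prob_s_plus([1.0, 1.0], identity2)    # mean pushed towards S_+
```

Here `centered` is exactly 1/2, and `shifted` exceeds 1/2 because the mean of $\sum_i x_i$ is positive; $P(Y \in S_-)$ is the complement in each case.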


3.2  Compositional  Gaussian  Distribution,  Type  2,  [CGD(2)] 


Let $X \sim N_p(\mu, \Sigma)$ and define $Y = (\sum_i x_i)^{-1} X$. For this we consider the transformation

$$ x_i = R y_i, \quad i = 1, \ldots, p-1, \qquad x_p = R(1 - y_1 - \cdots - y_{p-1}) = R y_p \tag{3.11} $$

so that $\sum_i y_i = 1$. We define the marginal distribution of Y on the simplex

$$ S = \{Y : \textstyle\sum_i y_i = 1\} $$

as the Compositional Gaussian Distribution, Type 2. Making the transformation (3.11),
proceeding as in Section 3.1 and integrating the expression corresponding to (3.8) with
respect to r from $-\infty$ to $\infty$, we obtain the p.d.f. of Y with respect to the volume element
$dy_1 \cdots dy_{p-1}$ as

$$ (2\pi)^{-p/2} |\Sigma|^{-1/2} Q_3^{-p/2} [I_p(Q_2 Q_3^{-1/2}) + I_p(-Q_2 Q_3^{-1/2})] \exp[-2^{-1}(Q_1 - Q_2^2 Q_3^{-1})] \tag{3.12} $$

where $Q_1$, $Q_2$ and $Q_3$ are as in (3.7) and $I_p$ is as defined in (2.5).

As in the other cases, the p.d.f. remains unchanged if $\mu$ and $\Sigma$ are replaced by $a\mu$
and $a^2\Sigma$ respectively for any scalar $a > 0$.


As in (2.7), the conditional distribution of Y given $r = \kappa$ (a constant) is

$$ \text{const.}\; Q_3^{-p/2} \exp(\kappa Q_2 Q_3^{-1/2}) \tag{3.13} $$

which could be used as a probability model for compositional data.

If we define $Y = x_p^{-1} X$, then the distribution of $y_1, \ldots, y_{p-1}$ is the same as
in (3.13). In the computation of $Q_2$ and $Q_3$, we substitute the value 1 for $y_p$. A
natural way of normalizing the parameters $\mu$ and $\Sigma$ is to consider

$$ \mu_p^{-1}\mu \quad \text{and} \quad \mu_p^{-2}\Sigma. \tag{3.14} $$


3.3  Logistic  Gaussian  and  Related  Distributions 


Let Y be a vector of non-negative components $y_1, \ldots, y_p$ such that $\sum_i y_i = 1$.
One possible model which has been studied in detail is the logistic Gaussian
distribution, which assumes that

$$ X' = (x_1, \ldots, x_{p-1}) = (\log y_1 y_p^{-1}, \ldots, \log y_{p-1} y_p^{-1}) \tag{3.15} $$

has a (p-1)-variate Gaussian distribution (see Aitchison and Shen (1980), Aitchison (1982)).
In such a case the p.d.f. of Y can be written in the form

$$ |2\pi\Sigma|^{-1/2} (y_1 \cdots y_p)^{-1} \exp[-2^{-1}(X - \mu)'\Sigma^{-1}(X - \mu)] \tag{3.16} $$

where X can be expressed in terms of Y as in (3.15).
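The log-ratio map (3.15) and its inverse can be sketched as follows. The function names are chosen here for illustration; the inverse is the usual logistic map that sends any real (p-1)-vector back to positive proportions summing to 1.

```python
import math

def log_ratio(y):
    """x_i = log(y_i / y_p), i = 1, ..., p-1, as in (3.15)."""
    return [math.log(v / y[-1]) for v in y[:-1]]

def log_ratio_inverse(x):
    """Recover the composition: y_i = exp(x_i) / (1 + sum_j exp(x_j)), y_p = 1 / (1 + sum_j exp(x_j))."""
    e = [math.exp(v) for v in x] + [1.0]
    total = sum(e)
    return [v / total for v in e]

# Hypothetical composition with strictly positive parts.
y = [0.2, 0.3, 0.5]
x = log_ratio(y)             # [log 0.4, log 0.6]
y_back = log_ratio_inverse(x)
```

The map requires strictly positive components, so compositions containing zeros need separate treatment before a Gaussian model is fitted to the transformed values.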


In building a model for Y we could have used other transformations from the basic
(p-1)-dimensional Gaussian variable X such as

$$ x_i = [(y_i/y_p)^{\lambda} - 1]/\lambda, \quad i = 1, \ldots, p-1 \tag{3.17} $$

which is the Box and Cox (1964) transformation, or more generally any appropriate
transformation

$$ x_i = h_i(Y), \quad i = 1, \ldots, p-1 \tag{3.18} $$

suggested by data.

A well-known distribution for compositional data with non-negative proportions is
the Dirichlet class $D_p(\beta)$ with the typical density function

$$ [\Delta(\beta)]^{-1} y_1^{\beta_1 - 1} \cdots y_p^{\beta_p - 1} \tag{3.19} $$

where

$$ \Delta(\beta) = \Gamma(\beta_1) \cdots \Gamma(\beta_p)/\Gamma(\beta_1 + \cdots + \beta_p). $$

Aitchison (1985) considered a mixture of a Dirichlet and a logistic Gaussian distribution,
imposing some relationship between the parameters $\mu$ and $\beta$ to reduce the number
of free parameters in the model. He also provided a computational procedure for
obtaining the maximum likelihood estimators of the parameters in such a mixture of
distributions.
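The Dirichlet density (3.19) can be evaluated stably on the log scale. The sketch below uses only the standard library; the function name and the test compositions are illustrative assumptions.

```python
import math

def dirichlet_pdf(y, beta):
    """Density (3.19) at a composition y with positive parts summing to 1."""
    # log Delta(beta) = sum log Gamma(beta_i) - log Gamma(sum beta_i)
    log_delta = sum(math.lgamma(b) for b in beta) - math.lgamma(sum(beta))
    log_kernel = sum((b - 1.0) * math.log(v) for v, b in zip(y, beta))
    return math.exp(log_kernel - log_delta)

# With all beta_i = 1 the density is constant and equals Gamma(p) = (p-1)!.
flat = dirichlet_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0])
# With beta = (2, 1) the density on the segment is 2 * y_1.
tilted = dirichlet_pdf([0.25, 0.75], [2.0, 1.0])
```

Working with `math.lgamma` avoids overflow of the gamma function when some $\beta_i$ are large.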


4.  ESTIMATION OF THE DENSITY FUNCTION


Let $Y_1, \ldots, Y_n$ be independent observations on a random variable Y defined on $\Omega_p$,
the p-dimensional unit sphere. If a suitable model for the distribution of Y is not
available, we may use non-parametric methods and estimate its p.d.f. based on $Y_1, \ldots, Y_n$.
For this purpose we define a window function on $\Omega_p$ which is indexed by two
parameters x and $\theta$, $x \in \Omega_p$ and $0 \le \theta < \pi/2$,

$$ W_{x,\theta}(Y) = \begin{cases} 1 & \text{if } x'Y \ge \cos\theta, \\ 0 & \text{otherwise.} \end{cases} \tag{4.1} $$


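The set of points Y satisfying the first condition in (4.1) defines a cap on $\Omega_p$ with x as a
central point, whose area is

$$ a(\theta) = \frac{2\pi^{(p-1)/2}}{\Gamma\big(\frac{p-1}{2}\big)} \int_0^{\theta} \sin^{p-2}\psi \, d\psi. \tag{4.2} $$

The number of points falling on this cap is

$$ \sum_{i=1}^{n} W_{x,\theta}(Y_i). \tag{4.3} $$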


By choosing a small value of $\theta = \theta_n$, an estimate of the p.d.f. of Y at x may be obtained
as

$$ p_n(x) = n^{-1} \sum_{i=1}^{n} [a(\theta_n)]^{-1} W_{x,\theta_n}(Y_i). \tag{4.4} $$
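A minimal sketch of the cap estimator follows. The cap area is computed with the usual formula for a spherical cap of angular radius $\theta$ on the unit sphere in p dimensions, integrated by the midpoint rule; the function names and the step count are assumptions made for the illustration.

```python
import math

def cap_area(theta, p, steps=2000):
    """Area of the cap {Y : x'Y >= cos(theta)} on the unit sphere in R^p."""
    const = 2.0 * math.pi ** ((p - 1) / 2.0) / math.gamma((p - 1) / 2.0)
    h = theta / steps
    integral = h * sum(math.sin((j + 0.5) * h) ** (p - 2) for j in range(steps))
    return const * integral

def cap_density_estimate(x, sample, theta):
    """Fraction of sample points in the cap around x, divided by the cap area."""
    p = len(x)
    threshold = math.cos(theta)
    count = sum(1 for y in sample
                if sum(a * b for a, b in zip(x, y)) >= threshold)
    return count / (len(sample) * cap_area(theta, p))
```

For p = 3 the area reduces to $2\pi(1 - \cos\theta)$ and for p = 2 to the arc length $2\theta$, which gives a quick check of the constant. For points spread evenly on the unit circle the estimate should be close to the uniform density $1/(2\pi)$.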


More generally we could use any suitable p.d.f. on $\Omega_p$ as a window function. In
particular we suggest the use of the Langevin density (2.8)

$$ c(\kappa_n) \exp(x'Y/\kappa_n) \tag{4.5} $$

and estimate the p.d.f. of Y as

$$ p_n(x) = n^{-1} \sum_{i=1}^{n} c(\kappa_n) \exp(x'Y_i/\kappa_n). \tag{4.6} $$


We can choose $\kappa_n$ by the method of Habbema (1974) as the value $\kappa$ at
which the pseudo-likelihood

$$ \prod_{i=1}^{n} (n-1)^{-1} \sum_{j \ne i} c(\kappa) \exp(Y_i'Y_j/\kappa) \tag{4.7} $$

is maximized. Further work on density estimation will be reported elsewhere.
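The Langevin window estimator and the leave-one-out choice of the window parameter can be sketched for the ordinary sphere (p = 3), where the normalizing constant has the elementary form $c(\kappa) = \kappa^{-1}/[4\pi\sinh(\kappa^{-1})]$. Restricting to p = 3, searching over a small grid, and the simulated sample are all assumptions of this illustration; for other p the constant involves a modified Bessel function.

```python
import math
import random

def langevin_kernel(x, y, kappa):
    """c(kappa) exp(x'y / kappa) on the unit sphere in R^3.

    Written as c0 / (2 pi (1 - exp(-2 c0))) * exp(c0 (x'y - 1)) with c0 = 1 / kappa,
    which equals c0 / (4 pi sinh(c0)) * exp(c0 x'y) but avoids overflow.
    """
    c0 = 1.0 / kappa
    dot = sum(a * b for a, b in zip(x, y))
    return c0 / (2.0 * math.pi * (1.0 - math.exp(-2.0 * c0))) * math.exp(c0 * (dot - 1.0))

def density_estimate(x, sample, kappa):
    """Average of the kernel centred at each observation, evaluated at x."""
    return sum(langevin_kernel(x, y, kappa) for y in sample) / len(sample)

def log_pseudo_likelihood(sample, kappa):
    """Log of the leave-one-out pseudo-likelihood; maximizing it maximizes the product."""
    n = len(sample)
    total = 0.0
    for i, yi in enumerate(sample):
        s = sum(langevin_kernel(yi, yj, kappa) for j, yj in enumerate(sample) if j != i)
        total += math.log(s / (n - 1))
    return total

def choose_kappa(sample, grid):
    """Grid search for the window parameter."""
    return max(grid, key=lambda k: log_pseudo_likelihood(sample, k))

def random_unit_vector(rng):
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

# Hypothetical data: 60 directions drawn uniformly on the sphere.
rng = random.Random(1)
sample = [random_unit_vector(rng) for _ in range(60)]
grid = [0.05, 0.1, 0.2, 0.5, 1.0]
best = choose_kappa(sample, grid)
```

A grid search is used only for simplicity; any one-dimensional optimizer could replace it.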